When we think of using data to make better business decisions, we think of social networks and the promise of online services customized for every individual. It is easy to overlook the many other forms of data that are already generated constantly in the course of doing business. Consider MasterCard, a company with data on billions of credit card transactions. How can we apply new techniques and ideas to these existing stockpiles of data to help people make better decisions, whether in combating fraud or determining retail trends? We look into unleashing the hidden potential of existing information silos by transforming them into data warehouses.
Max Levchin will join us in class and discuss how Affirm uses the social graph for loan decisions.
Timeline Oct 7, 2014
3:30 Setting up
3:40 CLASS BEGINS
3:50 Max Levchin, Affirm
4:20 Student breakout (Max to pose a question, students to discuss with their neighbors)
4:30 Discussion Max / Andreas
4:45 Max’s vision for the future of data and society, fairness, data ownership (what would you do with all of the data)
4:50 Summary
5:00 BREAK
Money, Money, Money
Many companies have been revolutionizing the digital payments industry. These are some of the big players as well as up-and-coming startups in the field today.
Affirm
Max Levchin, co-founder of PayPal, is now the founder and CEO of Affirm, a company that aims to offer affordable point-of-sale (POS) financing to people who lack the credit needed to obtain it conventionally. Affirm is seen as a possible alternative to credit cards, although the mechanism is similar: payments are still split over time, and can be made via debit card, bank account, or check. The lending decision is not based on a credit score, but on a detailed model that can weigh thousands of personal attributes to determine whether an individual is likely to pay back the loan.
Klarna
Square
Stripe
PayPal (also Bill Me Later)
Venmo
Snapcash
Lending Landscape
Background
"Debt is a good thing because it allows you to borrow against your future self or to invest in your more intelligent self." - Max Levchin (refer to video)
Underwriting is simply a data problem. If we knew everything about a person, we could figure out whether to trust them and how much to lend them. Credit cards are built fundamentally on our ability to price risk. This risk is assessed by how likely an individual is to pay back a debt, and is spread over all loans in the form of an interest rate.
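To make "spread over all loans in the form of an interest rate" concrete, here is a minimal break-even sketch. It assumes a one-period loan of 1 unit, a known default probability `p_default`, a fraction `recovery` of principal recovered on default, and a lender cost of funds `cost_of_funds`; these parameter names and the one-period simplification are assumptions for illustration, not how any particular lender prices loans. It solves (1 - p)(1 + r) + p * recovery = 1 + cost_of_funds for r.

```python
def break_even_rate(p_default: float, recovery: float = 0.0,
                    cost_of_funds: float = 0.0) -> float:
    """Interest rate at which a pool of identical one-period loans breaks even.

    Hypothetical simplified model: each loan either repays principal plus
    interest, or defaults and returns only `recovery` per unit lent.
    """
    if not 0.0 <= p_default < 1.0:
        raise ValueError("p_default must be in [0, 1)")
    # Solve (1 - p) * (1 + r) + p * recovery = 1 + cost_of_funds for r.
    return (1.0 + cost_of_funds - p_default * recovery) / (1.0 - p_default) - 1.0


# 5% of borrowers default with nothing recovered; money costs the lender 2%.
print(f"{break_even_rate(0.05, recovery=0.0, cost_of_funds=0.02):.4f}")  # 0.0737
```

Even a 5% default rate pushes the break-even rate well above the 2% cost of funds: the borrowers who repay cover the losses on those who do not.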
Annual Percentage Rate (APR)
The APR (annual percentage rate) of a loan is its cost of borrowing expressed as a yearly rate. Although APR is simple in concept, there is no simple formula for calculating it, and people generally don't understand what it means. In this way, the current system is broken: individuals, even those who plan on paying off the entire loan, often sign contracts with interest they are not able to keep up with.
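Because there is no closed-form formula in general, APR is found numerically: discount the payment schedule and search for the rate at which it equals the amount actually financed. The sketch below is a simplified version under stated assumptions: equal end-of-month payments, nominal APR reported as 12 times the monthly rate, and bisection as the search method. The function names are hypothetical, and regulatory APR disclosures follow additional rules not modeled here.

```python
def present_value(payment: float, monthly_rate: float, n_payments: int) -> float:
    """Present value of n equal end-of-month payments at a given monthly rate."""
    if monthly_rate == 0.0:
        return payment * n_payments
    return payment * (1.0 - (1.0 + monthly_rate) ** -n_payments) / monthly_rate


def estimate_apr(amount_financed: float, payment: float, n_payments: int,
                 max_monthly_rate: float = 1.0, iterations: int = 200) -> float:
    """Solve for the nominal APR (12 x monthly rate) by bisection.

    Present value falls as the rate rises, so the root can be bracketed.
    """
    if present_value(payment, 0.0, n_payments) < amount_financed:
        raise ValueError("payments do not even repay the amount financed")
    if present_value(payment, max_monthly_rate, n_payments) > amount_financed:
        raise ValueError("rate lies above the search bracket")
    lo, hi = 0.0, max_monthly_rate
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if present_value(payment, mid, n_payments) > amount_financed:
            lo = mid  # payments are worth too much at this rate: true rate is higher
        else:
            hi = mid
    return 12.0 * (lo + hi) / 2.0


# A $1,000 loan repaid in 12 monthly payments at 1% per month.
rate = 0.01
payment = 1000 * rate / (1 - (1 + rate) ** -12)
print(round(estimate_apr(1000, payment, 12), 4))  # 0.12
# If a $50 fee is deducted up front, only $950 is financed and the APR rises.
print(estimate_apr(950, payment, 12) > 0.12)  # True
```

The fee example shows why the interest rate and the APR can differ: the same payments buy less actual credit.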
FICO Scores
In short, a FICO score is essentially a bunch of small factors plus your debt-to-income ratio: how much you have already borrowed compared with how much you earn to pay it off. This alone is actually quite effective at gauging an individual's creditworthiness. The problem with FICO scores is that they update only quarterly to semiannually, so they do not reflect the most up-to-date and relevant information about a person.
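The debt-to-income ratio itself is simple arithmetic. This sketch assumes the common convention of total monthly debt payments divided by gross monthly income; the function name and sample figures are hypothetical. It is not how a FICO score is computed, since that model is proprietary.

```python
from typing import Sequence


def debt_to_income(monthly_debt_payments: Sequence[float],
                   gross_monthly_income: float) -> float:
    """Share of gross monthly income already committed to debt payments."""
    if gross_monthly_income <= 0:
        raise ValueError("gross_monthly_income must be positive")
    return sum(monthly_debt_payments) / gross_monthly_income


# Hypothetical borrower: car loan, student loan, and a credit card minimum payment.
print(debt_to_income([350.0, 250.0, 100.0], 3500.0))  # 0.2
```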
Identity Resolution and Fraud Mitigation
When someone comes in to take out a loan, the first goal is to figure out whether they are who they say they are, i.e., identity resolution of a node in a graph. Part two is fraud detection or fraud prediction, which is a problem about the edges of the social graph. If we know all of a person's transactions and friends, it is easy to pin down the truth about them (for example, that an applicant claiming to be in Berkeley is actually in Turkey). As Max Levchin says, this is not a fully solved problem, but it is relatively well contained these days: we are generally able to detect and neutralize bad transactions, but there are still many bad actors trying to fool the system.
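As a toy illustration of treating fraud as an edge problem, the sketch below checks a claimed location against the locations of an applicant's graph neighbors (friends or transaction counterparties). The data, function names, and the idea of using neighbor cities as a signal are hypothetical choices for this example, not Affirm's method.

```python
from collections import Counter
from typing import Optional, Sequence


def location_consistency(claimed_city: str, neighbor_cities: Sequence[str]) -> float:
    """Fraction of an applicant's graph neighbors located in the claimed city."""
    if not neighbor_cities:
        return 0.0  # no edges, so nothing corroborates the claim
    matches = sum(1 for city in neighbor_cities if city == claimed_city)
    return matches / len(neighbor_cities)


def likeliest_city(neighbor_cities: Sequence[str]) -> Optional[str]:
    """Most common city among the neighbors, or None if there are none."""
    if not neighbor_cities:
        return None
    return Counter(neighbor_cities).most_common(1)[0][0]


neighbors = ["Istanbul", "Istanbul", "Ankara", "Berkeley"]
print(location_consistency("Berkeley", neighbors))  # 0.25
print(likeliest_city(neighbors))  # Istanbul
```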
Underwriting and Risk Assessment
Underwriting refers to the process that a large financial service provider (bank, insurer, investment house) uses to assess a customer's eligibility to receive its products (equity capital, insurance, mortgages, or credit).
Once you can resolve the identity of a person, there are three possible outcomes of giving a loan:
The loan is paid back with no problems.
The borrower means to pay back the loan but can't, due to irresponsibility or a poor assessment of their own ability to repay. The accumulated interest from many responsible borrowers offsets the losses from these borrowers.
The borrower takes on an appropriate amount of debt, but an unexpected event (bankruptcy, death, etc.) stops repayment. This is an unpredictable data problem. There is no way to know if or when events like these will happen, but assessing the likelihood of these events is necessary in
In order for a lending firm like Affirm to succeed, there must be criteria for deciding who to lend to and at what interest rate. This essentially amounts to a big data problem: given all of the (legally decidable) data that is out in the world today, what questions need to be answered in order to best judge the risk of giving a loan?
Is the borrower trying to steal money? Stolen identities are a good clue that the borrower does not intend to return the money (for obvious reasons). Hundreds of features can be used in heuristics to address this problem. One example: people who steal identities are less likely to care how the name is spelled, and applicants who enter their name in lower case are slightly more likely not to attempt to pay back the loan.
How much money are you willing to risk with someone? This is the question FICO credit scores were originally designed to answer. However, that model is very simplistic: only tens of factors are used, and it started with only a handful. It mostly revolves around the debt-to-income ratio: how much you owe compared with how much you make in a year. FICO also updates very infrequently, sometimes only a couple of times per year. Smarter approximations can now attempt to predict a borrower's future FICO score, for example when the borrower is about to start a job (and start making money).
How do you predict the likelihood of bad things happening to good people? At one extreme, you could record every action of every person (e.g., with a camera on their forehead) to try to determine risk. That is not feasible, and even so there would remain the problem of unbalanced risk: inevitably, the least risky people pay for the most risky people, which creates a possible moral hazard.
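The name-spelling heuristic from the first question can be turned into features with a few string checks. This is a minimal sketch; the function name and feature names are hypothetical, and how much weight such a feature deserves would have to be learned from repayment data.

```python
def name_entry_features(first_name: str, last_name: str) -> dict:
    """Hypothetical features describing how a name was typed into an application."""
    full_name = f"{first_name} {last_name}"
    return {
        # str.islower() is True only if there is a cased letter and none are uppercase
        "all_lowercase": full_name.islower(),
        "all_uppercase": full_name.isupper(),
        # the first character of either part is a lowercase letter
        "part_not_capitalized": any(part[:1].islower() for part in (first_name, last_name)),
    }


print(name_entry_features("max", "levchin"))
# {'all_lowercase': True, 'all_uppercase': False, 'part_not_capitalized': True}
print(name_entry_features("Max", "Levchin"))
# {'all_lowercase': False, 'all_uppercase': False, 'part_not_capitalized': False}
```

Legitimate names such as "de la Cruz" also trip `part_not_capitalized`, which is one reason a signal like this can only nudge the risk estimate slightly rather than decide a loan on its own.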
Novel Data Sources
These days, many signals of a person's financial responsibility come from sources outside FICO scores. Every aspect of our lives, what we do and how we do it, is somehow related to who we are as financial individuals. Through these small observations, we can collect important data to gauge an individual's trustworthiness.
Education
Social networking usage
Donation tracking
Responsibility games
Benign self-infected spyware
Sporting behavior
Video game performance
Social FICO Learning Games
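One standard way to combine many weak signals like those listed above into a single risk estimate is a logistic model. In the sketch below the feature names, weights, and bias are invented for illustration only; a real lender would learn them from repayment outcomes, and nothing here reflects a specific company's model.

```python
import math

# Hypothetical, hand-picked weights for illustration. Positive weights raise
# the estimated default probability; negative weights lower it.
WEIGHTS = {
    "completed_degree": -0.4,    # from education data
    "regular_donor": -0.2,       # from donation tracking
    "name_all_lowercase": 0.3,   # application-entry heuristic
}
BIAS = -2.0


def default_probability(features: dict) -> float:
    """Logistic combination of weak signals; features not in WEIGHTS are ignored."""
    z = BIAS + sum(WEIGHTS[name] * float(value)
                   for name, value in features.items() if name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


print(round(default_probability({}), 3))  # 0.119
print(round(default_probability({"completed_degree": 1, "regular_donor": 1}), 3))  # 0.069
```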
Financial Data Revolution
Many companies are revolutionizing the field in their drive for better data and their willingness to navigate obstacles to obtain it. Every financial services company ends up being built by people with the fortitude to lose tons of money while they learn what kinds of users they have. The ultimate goal is to build systems that can take the identity of a person and output a summary of his or her financial risk. These are some of the companies doing their part in the revolution.
Affirm
Assesses the financial responsibility of people at POS terminals to quickly determine loan credibility, using many data sources beyond FICO scores.
Gnip
Real-time sentiment and breaking-news analysis using social data sources. It aggregates social media data under one API, and was acquired by Twitter in 2014.
Quandl
A search engine for almost all types of numerical data, including financial and economic data sets. Open data, and free!
Datacoup
Provides a marketplace for users to sell their personal social data. As of February 2014, it pays $8 per month for access to social data, including Twitter, Facebook, and credit card transactions. It then sells the insights and correlations found in this data to other corporations. Read more in the data ownership wiki!
Financial Trust and Bitcoin
Bitcoin is a decentralized virtual currency that is based on cryptography.
Mechanics
Bitcoin is an unfalsifiable, fully distributed ledger. It is not really a currency; it is simply a list of records of exchanges of promissory notes measured in bitcoins. The records are permanent and irreversible by the nature of the system: in theory, there will always be enough participants in the Bitcoin ecosystem to validate the ledger.
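The "unfalsifiable" property rests partly on hash chaining: each record commits to the hash of the one before it, so altering any past record breaks every later link. The sketch below shows only that tamper-evidence idea, with a hypothetical record format and function names; it leaves out what makes Bitcoin's ledger hard to rewrite in practice (proof-of-work mining and distributed consensus) as well as its real transaction format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def record_hash(record: dict, prev_hash: str) -> str:
    """SHA-256 over the record together with the previous entry's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"record": record, "prev": prev, "hash": record_hash(record, prev)})


def verify(ledger: list) -> bool:
    """Recompute every link; any edited record makes verification fail."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != record_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"from": "alice", "to": "bob", "btc": 1.5})
append(ledger, {"from": "bob", "to": "carol", "btc": 0.5})
print(verify(ledger))  # True
ledger[0]["record"]["btc"] = 15  # tamper with history
print(verify(ledger))  # False
```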
Minds of Bitcoin
It is important to note that while the monetary value of Bitcoin is unstable and questionable, there is an intrinsic value created by the intellect spent on the field. Marc Andreessen and Peter Thiel are just some of the brilliant minds who have a very positive outlook on the future of virtual currency and are dedicating many hours to furthering its development and adoption.
Future
Whether through Bitcoin or something else, cryptographic accounting is likely the future. This style of accounting applies to any trust-dependent event, such as a notary witnessing a signature. We are likely to see this methodology extend into other fields in the near future.
MasterCard Lecture
What is big data?
Big data is the aggregation of data created by people, whether voluntarily or involuntarily. It consists of their actions and their interactions with the people and the world around them.
Big Data for MasterCard
MasterCard has real-time transaction data and transaction histories for 2 billion accounts. It has each account's billing address for localization and can see where people spend money and what they spend it on.
Big Data Outcomes
attrition
best customer
next purchase
delinquency
fraud
credit line increase/decrease
retention
mobile user
Transaction Variables
amount/count
time of day
amount over time
amount of recurring payments
amount of online transactions
purchase sequence
date of first/last transaction
transaction patterns
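Several of the transaction variables above can be derived directly from a list of transactions. In this minimal sketch the `Transaction` fields, the rule that a repeated (merchant, amount) pair counts as a recurring payment, and the "before 6 a.m." time-of-day feature are all hypothetical choices; the lecture did not specify MasterCard's actual definitions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Transaction:
    timestamp: datetime
    amount: float
    merchant: str
    online: bool


def transaction_features(txns: List[Transaction]) -> dict:
    """Compute a few per-account variables from a list of transactions."""
    if not txns:
        raise ValueError("need at least one transaction")
    txns = sorted(txns, key=lambda t: t.timestamp)
    # Hypothetical rule: a (merchant, amount) pair seen more than once is recurring.
    pair_counts = Counter((t.merchant, t.amount) for t in txns)
    return {
        "count": len(txns),
        "total_amount": sum(t.amount for t in txns),
        "online_amount": sum(t.amount for t in txns if t.online),
        "recurring_amount": sum(t.amount for t in txns
                                if pair_counts[(t.merchant, t.amount)] > 1),
        "late_night_count": sum(1 for t in txns if t.timestamp.hour < 6),  # time of day
        "first_transaction": txns[0].timestamp,
        "last_transaction": txns[-1].timestamp,
        "purchase_sequence": [t.merchant for t in txns],
    }


history = [
    Transaction(datetime(2014, 9, 5, 13, 0), 120.00, "Grocer", False),
    Transaction(datetime(2014, 8, 1, 9, 15), 9.99, "StreamingCo", True),
    Transaction(datetime(2014, 9, 3, 2, 30), 45.00, "Bar", False),
    Transaction(datetime(2014, 9, 1, 9, 15), 9.99, "StreamingCo", True),
]
features = transaction_features(history)
print(features["count"], features["late_night_count"])  # 4 1
print(features["purchase_sequence"])  # ['StreamingCo', 'StreamingCo', 'Bar', 'Grocer']
```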
Data Types
transaction
SKU
credit bureau
social
sentiment
location
demographic
Discovery Driven Planning
DDP is a framework used to define, discuss, and test new business plans.
Follow-up ideas:
To what degree do loans vary given the identity of a person?
The decision to give a loan is based on the likelihood that the borrower will pay back the loan plus interest. The difficulty in offering a loan is correctly pricing the risk. To do that, a number of factors are considered, including age, current debt, earning power, credit history, length of loan, etc. All these factors and many more go into calculating the risk involved in offering you a loan (Julian Prochaska).
Given a FICO score, how do you determine whether or not a person gets the loan?
Since the FICO score is just a measurement of risk, whether someone gets a loan really comes down to whether the lender considers them a good risk. This decision process can be broken into three categories: 1) the FICO score, 2) how much the loan is for, and 3) the risk of the loan itself.
The first category (FICO score) tries to assess how likely the borrower is, in general, to pay back money they borrow.
The second category deals with what percentage of the lender's money the person is asking to borrow.
The third category deals with why the person needs the money and the risk involved in how they spend it (is it for a house, or to start a risky business that will likely fail?), the person's ability to pay back the loan, the assets the person has, and how much money can be recovered if the borrower defaults.
Since the lender has a finite amount of money, it is then up to the lender to determine how risk averse they want to be with it (i.e., do they want to lend their money to anyone, or only to people they know will pay them back?). (Julian Prochaska)
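The three categories can be sketched as three independent checks whose thresholds encode the lender's risk aversion. Everything here is hypothetical: the function and parameter names, the default thresholds, and the idea of summarizing "risk of the loan itself" as a single number `purpose_risk` between 0 and 1.

```python
from typing import List


def loan_decision(fico: int, loan_amount: float, lender_capital: float,
                  purpose_risk: float, min_fico: int = 640,
                  max_capital_share: float = 0.01,
                  max_purpose_risk: float = 0.5) -> List[str]:
    """Return the reasons to decline; an empty list means approve.

    Each threshold is a hypothetical stand-in for how risk averse the lender is.
    """
    reasons = []
    if fico < min_fico:                                   # 1) FICO score
        reasons.append("score below lender minimum")
    if loan_amount / lender_capital > max_capital_share:  # 2) share of lender's money
        reasons.append("loan too large a share of capital")
    if purpose_risk > max_purpose_risk:                   # 3) risk of the loan itself
        reasons.append("use of funds too risky")
    return reasons


print(loan_decision(fico=700, loan_amount=5_000, lender_capital=1_000_000, purpose_risk=0.2))  # []
print(loan_decision(fico=600, loan_amount=50_000, lender_capital=1_000_000, purpose_risk=0.8))
```

Tightening the default thresholds models a more risk-averse lender; loosening them models one willing to lend to almost anyone.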
Is getting a loan a binary decision based on your score?
The short answer is no, although you may need to meet a minimum credit score to be considered for specific loans.
One obvious sign that it is not binary, and one reason people try to build a good credit score, is that your interest rate goes down as your credit score goes up (i.e., the better your credit score, the less interest you pay over time for the money you borrow). While you might need to meet a minimum credit score to be considered by certain lenders, the amount you pay in interest can change dramatically as your risk profile changes. (Julian Prochaska)
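Both points, a hard minimum score plus a rate that falls as the score rises, fit in a small tier table. The score cutoffs and APRs below are invented for illustration and are not real market rates.

```python
from typing import Optional

# Hypothetical (minimum score, APR) tiers, highest tier first; not real market rates.
RATE_TIERS = [(760, 0.06), (700, 0.09), (660, 0.13), (620, 0.18)]


def quoted_apr(score: int) -> Optional[float]:
    """APR offered at a given score, or None below the lender's minimum score."""
    for min_score, apr in RATE_TIERS:
        if score >= min_score:
            return apr
    return None


for s in (780, 700, 630, 600):
    print(s, quoted_apr(s))
# 780 0.06
# 700 0.09
# 630 0.18
# 600 None
```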
Alternatives to FICO?
List your ideas...
The FICO score is really ripe for disruption. Max Levchin brought up the idea of harnessing social data as an alternative to FICO in class, and I've thought a lot about it since. One idea for redesigning the FICO score is, instead of guessing your risk from a list of facts about you, to approximate your risk based on how much the people who actually know you would lend you. Looking at how much the people who know you trust you with their money could give a much better approximation of how risky a person you are. On top of that, another way to get that data is to build a decentralized microlending platform where friends lend to friends. People would be required to rank their friends, relative to their other friends, by how much they would lend them and how trustworthy they think they are. Harnessing that data, you could see which people are trustworthy and responsible and which aren't. (Julian Prochaska)
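One way to turn these friend rankings into a single number per person is to treat them as a weighted graph and run a PageRank-style iteration, so that a pledge from a well-trusted friend counts for more. The algorithm choice, the `pledges` input format (how much each person would lend each friend), and the damping value are assumptions swapped in for illustration; the proposal above does not name an algorithm.

```python
from typing import Dict


def trust_scores(pledges: Dict[str, Dict[str, float]], damping: float = 0.85,
                 iterations: int = 100) -> Dict[str, float]:
    """PageRank-style trust from 'how much would you lend each friend' pledges.

    Each rater's pledges are normalized to sum to 1, so only their relative
    ranking of friends matters. Scores sum to 1 across everyone.
    """
    people = sorted(set(pledges) | {f for given in pledges.values() for f in given})
    if not people:
        return {}
    n = len(people)
    scores = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in people}
        dangling = 0.0  # score held by people who pledged nothing
        for rater in people:
            given = pledges.get(rater, {})
            total = sum(given.values())
            if total <= 0:
                dangling += scores[rater]
                continue
            for friend, amount in given.items():
                new[friend] += damping * scores[rater] * amount / total
        for p in people:
            new[p] += damping * dangling / n  # spread dangling score evenly
        scores = new
    return scores


pledges = {
    "ann": {"bob": 100, "dan": 5},
    "bob": {"ann": 100, "dan": 5},
    "dan": {"ann": 50, "bob": 50},
}
for person, score in sorted(trust_scores(pledges).items(), key=lambda kv: -kv[1]):
    print(person, round(score, 3))
```

Here "dan" ends up with the lowest score because both of his friends would lend him far less than they would lend each other.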
TODO (details for students to add to this page):
What other possible risk signs can you think of that would indicate that the borrower doesn't intend to return the money?
What types of data can you look at to help determine this future FICO score?
What other problems can you imagine that Affirm needs to solve?
Oct 6 page created by: Matthew Fong (mfong92@berkeley.edu), George Yiu (georgeyiu@berkeley.edu)
Oct 21 part created by: Matthew Fong (mfong92@berkeley.edu), George Yiu (georgeyiu@berkeley.edu)
School of Information | University of California at Berkeley | INFO 290A-03
Audio: weigend_ischool2014_3.mp3
Transcript: weigend_ischool2014_3.docx